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ABSTRACT 

Using a semi-analytical model developed by Choudhury & Ferrara (2005 1 we study the ob 



servational constraints on reionization via a principal component analysis (PCA). Assuming 
that reionization at z > 6 is primarily driven by stellar sources, we decompose the unknown 
function Ni on (z), representing the number of photons in the IGM per baryon in collapsed ob- 
jects, into its principal components and constrain the latter using the photoionization rate, Tpi, 
obtained from Lya forest Gunn-Peterson optical depth, the WMAP7 electron scattering opti- 
cal depth T e i and the redshift distribution of Lyman-limit systems dN-Lt./dz at z ~ 3.5. The 
main findings of our analysis are: (i) It is sufficient to model Ni on (z) over the redshift range 
2 < z < 14 using 5 parameters to extract the maximum information contained within the 
data, (ii) All quantities related to reionization can be severely constrained for z < 6 because 
of a large number of data points whereas constraints at z > 6 are relatively loose, (iii) The 
weak constraints on Ni on (z) at z > 6 do not allow to disentangle different feedback models 
with present data. There is a clear indication that Ni on (z) must increase at z > 6, thus rul- 
ing out reionization by a single stellar population with non-evolving IMF, and/or star-forming 
efficiency, and/or photon escape fraction. The data allows for non-monotonic Ni on (z) which 
may contain sharp features around z ~ 7. (iv) The PCA implies that reionization must be 
99% completed between 5.8 < z < 10.3 (95% confidence level) and is expected to be 50% 
complete at z » 9.5 — 12. With future data sets, like those obtained by Planck, the z > 6 
constraints will be significantly improved. 
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1 INTRODUCTION 

The importance of studying hydrogen reionization at high red- 
shifts lies in the fact that it is tightly coupled to proper- 
ties of first luminous sources and subsequent galaxy formation 
(for reviews, see, Loeb & Barkana 2001 , Barkan a&Loeb 20011 
|Choudhury & Ferrara 2006a) |Choudhury 20 091. In recent years, 
studies in reionization have been boosted by (i) the availability of 
a wide range of data sets and (ii) the expectation that the volume 
of data would increase rapidly over the next few years (for reviews, 
see |Furlanetto, Oh, & Briggs 2006| |Fan, Carilli, & Keating 2006) l. 
Given such a large amount of data, it is important to develop theo- 
retical and statistical methods so that maximum information can be 
extracted. 

Theoretically, reionization is modelled either semi- 
analytically or by numerical simulations. Unfortunately, the 
physical processes relevant to reionization are so complex that 
neither of the two approaches can capture the overall picture 
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entirely. The simulations are indispensable for understanding 
detailed spatial distribution of ionized regions and topology of 
reionization. However, if one is interested in the evolution of 
globally-averaged quantities, then semi-analytical models prove to 
be very useful in providing insights. The main reason for this is 
that these models can probe a wide range of parameter space which 
can be quite large depending on our ignorance of the different 
processes. 

At present, our understanding of reionization is that it is pri- 
marily driven by ultra-violet radiation from stellar sources form- 
ing within galaxies. The major uncertainty in modelling reioniza- 
tion is to model the star-formation history and transfer of radia- 
tion from the galaxies to the intergalactic medium (IGM) which 
is usually parameterized through Mon, the number of photons 
entering the IGM per baryon in collapsed objects. This param- 
eter, in principle, has a dependence on z which can arise from 
evolution of star-forming efficiency, fraction of photons escap- 
ing from the host halo and chemical and radiative feedback pro- 
cesses. Note that this parameter remains uncertain even in nu- 
merical simulations, hence the semi-analytical models can be- 
come handy in studying a wide range of parameter values and 
the corresponding agreement with data sets. In analytical stud- 
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ies, Ni on {z) is either taken to be a piecewise constant func- 
tion ( [Wyithe & Loeb 20 03 , Choudhur y & Ferrara 2005|> or param- 
eterized using some known functions ( |Chiu, Fan, & Ostriker 2003 [ 
|Pritchard, Loeb, & Wyithe 2010} or modelled using a physically- 
motivated prescription ((Choudhury & Ferrara 2006b 1. In partic- 
ular, a model involving metal-free and normal stars with 
some prescription for radiative and chemical feedback can 
match a wide range of observations (|Choudhury & Ferrara 2 006b ; 
|Gallerani, Choudhury, & Ferrara 2006) and possibly make predic- 
tion regarding search for reionization sources by future experiments 
( |Choudhury & Ferrara 2007) . 

However, the fact remains that many of the physical processes 
involved in modelling iVj on are still uncertain. Given this, it is 
worthwhile doing a detailed probe of the parameter space and deter- 
mine the range of reionization histories that are allowed by the data. 
In other words, rather than working out the uncertain physics, one 
can ask the question as to what are the forms of Ni on (z) implied 
by the data itself. It is expected that in near future, with more data 
sets becoming available, the allowed range in the forms of Ni on (z) 
would be severely constrained, thus telling us exactly how reion- 
ization occurred. Now, it is obvious that the constraints on Ni on (z) 
will not be same for all redshifts, points where there are more and 
better data available, the constraint would be more tight. Similarly, 
since we deal with a heterogeneous set of data, it is expected that 
the constraints would depend on the nature of data used. It is thus 
important to know which aspects of reionization history can be con- 
strained by what kind of data sets. A method which is ideally suited 
to tackle this problem is to use the principal component analysis 
(PC A); this is a technique to compute the most meaningful basis to 
re-express the unknown parameter set and the hope is that this new 
basis will reveal hidden detailed statistical structure. 

In this work, we make a preliminary attempt to constrain 
N lon (z) using PC A and hence estimate the uncertainties in the 
reionization history. The main objective of the work would be to 
find out the widest possible range in reionization histories allowed 
by the different data sets. 

Throughout the paper, we assume a flat Universe with cos- 
mological parameters given by the WMAP7 best-fit values: Q, m = 
0.27, f2 A = 1 - Sim, Slbh 2 = 0.023, and h = 0.71. The pa- 
rameters defining the linear dark matter power spectrum we use are 
cr 8 = 0.8, n s = 0.96, dn s /dlnfc = ILarson et al. 2010b . 



2 SEMI- ANALYTICAL MODEL OF REIONIZATION 
2.1 Features of the model 

The semi-analytical model used in this work is based on 
|Choudhury & Ferrara (2005) and |Choudhury & Ferrara (2006bl >. 
Let us first summarize the main features of the model alongwith 
the modifications made in this work: 

• The model accounts for IGM inhomogeneities by adopt- 
ing a lognormal distribution according to the method outlined in 
Miralda-Escude, Haehnelt, & Rees (2000); reionization is said to 
be complete once all the low-density regions (say, with overden- 
sities A < A cr it ~ 60) are ionized. The mean free path of photons 
is thus determined essentially by the distribution of high density 
regions: 



A m fp(z) 



A 



(1) 



[1-F v (z)]^ 

where Fv is the volume fraction of ionized regions and Ao is a nor- 



malization parameter. In our earlier works, the value of this param- 
eter was fixed by comparing with low redshift observations while 
in this work, we treat it as a free parameter. We follow the ioniza- 
tion and thermal histories of neutral, HII and Helll regions simul- 
taneously and self-consistently, treating the IGM as a multi-phase 
medium. 

• The model assumes that reionization is driven by stellar 
sources. The stellar sources can further be divided into two classes, 
namely, (i) metal-free (i.e. PopIII) stars having a Salpeter IMF in 
the mass range 1 — IOOMq : they dominate the photoionization rate 
at high redshifts; (ii) PopII stars with sub-solar metallicities also 
having a Salpeter IMF in the mass range 1 — lOOM© . 

• Reionization by UV sources is accompanied by photo-heating 
of the gas, which can result in a suppression of star formation 
in low-mass haloes. We compute such (radiative) feedback self- 
consistently from the evolution of the thermal properties of the 
IGM. 

• Furthermore the chemical feedback including PopIII— >PopII 
transition is implemented using merger-tree based genetic approach 
Schneider et al. (2006). Under this approach, it is assumed that if 
a given star-forming halo has a progenitor which formed PopIII 
stars, then the halo under consideration is enriched and cannot form 
PopIII stars. In this work, we introduce an analytical formula for the 
transition from PopIII to PopII phase using the conditional proba- 
bility of Press-Schechter mass function (Lacey & Cole 1993[ >. The 
probability that a halo of mass M at z never had a progenitor in the 
mass range [M m i n (z), M + M IBS ] is given by 
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where Af m i n is the minimum mass of haloes which are able to form 
stars and M rcs represents the minimum increase in mass (either by 
accretion or by merger) of an object so that it may be identified 
as a new halo. The fraction of collapsed haloes which are able to 
form PopII and PopIII stars at redshift z are given by the following 
relations: 
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with /coiui(z) + /coii,in(z) = /con (2). The quantity pm is the 
comoving density of dark matter and dn/dM is number density 
of collapsed objects per unit comoving volume per unit mass range 
dPress & Schechter 1974V 

• Given the collapsed fraction, this model calculates the produc- 
tion rate of ionizing photons in the IGM as 
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where m is the total baryonic number density in the IGM and 
Afion.n (A^on.iii) is the number of photons from PopII (PopIII) stars 
entering the IGM per baryon in collapsed objects. The parameter 
iVion can actually be written as a combination of various other pa- 
rameters: 
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where e* denotes the star-forming efficiency (fraction of baryons 
within collapsed haloes going into stars), / esc is the fraction of 
photons escaping into the IGM, [dN^ /dM,] gives the number of 
photons emitted per frequency range per unit mass of stars (which 
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depends on the stellar IMF and the corresponding stellar spec- 
trum) and e = e*/ esc . For PopII stars with sub-solar metallici- 
ties having a Salpeter IMF in the mass range 1 — lOOAf©, we get 
Mon.n ~ 3200en, while for PopIII stars having a Salpeter IMF in 
the mass range 1 — lOOAf©, we get Num,m ~ 35000em. 

In this Section, we take en, em (or, equivalently 
JVion.n, ATionjn) to be independent of z and M, which im- 
plies that the star-forming efficiencies and the escape fractions 
do not depend on the mass of the star-forming halo and also do 
not evolve. However, note that the effective Ni on (which is the 
appropriately weighted average of Mon,n and Ni OUi iu) evolves 
with z 



N iou (z) = 



d/coll 



J 2 — + iVionJH 
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d /coll,II _j_ d /coll,III 
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dt 1 dt 

At high redshifts, we expect d/ co ii,ii/dt — > 0, hence Ni on (z) — > 
Mon,in, and similarly at low redshifts where chemical enrichment 
is widespread, we have Ni on (z) — > Ni on .u. 

• We also include the contribution of quasars based 
on their observed luminosity function at z < 6 
(Hopkins, Richards, & Hernquist 2007 1; we assume that they 
have negligible effects on IGM at higher redshifts. They are 
significant sources of photons at z < 4 and are particularly 
relevant for studying helium reionization. 

• The free parameters for this analysis would be en, em (or, 
equivalently iVion.n, Mon.lll) and Ao, the normalization which de- 
termines the mean free path of photons. 

• Usually, the model is constrained by comparing with a variety 
of observational data, namely, (i) redshift evolution of Lyman-limit 
absorption systems (LLS), (ii) IGM Lya and Ly/3 optical depths, 
(iii) electron scattering optical depth, (iv) temperature of the mean 
intergalactic gas, and (v) cosmic star formation history. However, 
most of the constraints on the model come from a subset of the 
above data sets. In this work, we would like to carry out a detailed 
likelihood analysis of the parameters. Hence to keep the analysis 
simple, the likelihood analysis is done using only three particular 
data sets which are discussed as follows: 

(i) We use estimates for the photoionization rates Tpi 
obtained using Lya forest Gunn-Peterson optical depth ob- 
servations and a large set of hydrodynamical simulations 
dBolton & Ha ehnelt 2007). The error-bars in these data points take 
into account the uncertainties in the thermal state of the IGM in 
addition to the observational errors in the Lya optical depth. The 
data points have a mild dependence on the cosmological param- 
eters which has been taken into account in this work. We also 
find that although the error-bars on Tpi are highly asymmetric, 
those on log(rpi) are relatively symmetric; hence we use values 
of log(Tpi) and the corresponding errors in our likelihood anal- 
ysis. The photoionization rate can be obtained in our model from 
hph(z) using the relation 



TpiO) = (1 + z 



p oo 
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where v the frequency of radiation, j/hi is the threshold frequency 
for photoionization of hydrogen and oh iy) is the photoionization 
cross section of hydrogen. 

(ii) The second set of observations we have used corresponds 
to the WMAP7 data on electron scattering optical depth r e i 
dLarson et al. 20101 . The reported value of this quantity depends 
on the background cosmological model used. In this work, we 
restrict ourselves to the flat CDM universe with a cosmological 
constant and use the corresponding constraints on r e i. Also, the 



Tei constraint is treated as a single data point which should be 
thought as a simplification because CMB polarization observations 
are, in principle, sensitive to the shape of the reionization history 
(Burigana et al. 2008). However, we have checked and found that 
the range of reionization histories considered in this paper would 
hardly make any difference to the currently observed large angu- 
lar scale polarization anisotropies other than the value of r e i. The 
quantity r c i can be obtained from our model given the global reion- 
ization history, in particular the comoving density of free electrons 
n e (z): 



:[t] 

T c i(z) = a T c I dt n e (1 + z) 



(8) 



where ot is the Thomson scattering cross section. 

(iii) Finally, we use the redshift distribution of LLS diVix /dz at 
z ~ 3.5 jProchaska, O'Meara, & WorseckTO lOl^The data points 
are obtained using a large sample of QSO spectra which results in 
extremely small statistical errors. However, there are various sys- 
tematic effects arising from effects like the incidence of proximate 
LLS and uncertainties in the continuum. Usually, these effects con- 
tribute to about 10-20% uncertainty in the data points. The quantity 
dN^/dz can be calculated in our model from the mean free path: 

d7V LL c 



dz X m{p (z)H(z)(l + z) 



(9) 



Note that inclusion of the Lyman-limit systems in the analysis is 
crucial for constraining the parameter Ao. 

The likelihood function used in our calculations is given by 



L oc exp(— £) 



(10) 



where C is the negative of the log-likelihood. It is estimated using 
the relation 



-| «DbB 



(11) 



where Q a represents the set of n bs observational data points de- 
scribed above, i.e., Q a = {log(rpi), r e i, dN^/dz} and a a are 
the corresponding observational error-bars. We constrain the free 
parameters by maximizing the likelihood function. We impose a 
prior such that reionization should be complete by z = 5.8, oth- 
erwise it will not match that Lya and Ly/3 forest transmitted flux 
data. 



2.2 Reionization Constraints 

The results of our likelihood analysis using the reionization model 
described above are summarized in Table[T] The evolution of vari- 
ous quantities for models which are allowed within 95% confidence 
limit is shown in FigureQ] 

The top-left panel of the figure shows the evolution of the ef- 
fective iVj on as given by equation ((6). One can see that the quantity 
attains a constant value « 10 at z < 6 which is a consequence of 



1 We did not include the more recent measurements of dN^/dz by 
|Songaila & C owie (2010) because the values are systematically larger than 
the ones quoted in[Prochaska, O'Meara, & Worseck (2010) at z ~ 3.5; in- 
clusion of both the data sets would lead to a bad fit for the model. The 
Songaila & Cowie (2010) set has a data point at z ~ 6 which is not present 
in other data sets, however the present error-bar on that particular point is 
relatively large and hence excluding it does not affect our constraints sig- 
nificantly. 
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Figure 1. The marginalized posteriori distribution of various quantities related to reionization history for a model with chemical feedback (Choudhury & 
Ferrara 2006). The solid lines correspond to the model described by mean values of the parameters while the shaded regions correspond to 2-cr limits. The 
points with error-bars denote the observational data points. Top-left: the evolution of the effective A r ; on (z); Top-middle: the hydrogen photoionization rate 
rpi(z) alongwith the constraints from Bolton & Haehnelt (2007); Top-right: the LLS distribution dN^/dz with data points from Prochaska, O'Meara & 
Worseck (2010); Bottom-left: the electron scattering optical depth r e j with the WMAP7 constraint (Larson et al. 2010); Bottom-middle: the volume filling 
factor of HII regions Qhii(z); Bottom-right: the global neutral hydrogen fraction xhi(z) with observational limits from QSO absorption lines (Fan et al. 
2006; filled square), Lya emitter luminosity function (Kashikawa et al. 2006; open triangle) and GRB spectrum analysis (Totani et al 2006; open square). Also 
shown are the constraints using dark gap statistics on QSO spectra (Gallerani et al 2008a; open circles) and GRB spectra (Gallerani et al. 2008b; filled circle). 



Parameters Mean value 95% confidence limits 



QI 




0.003 


[0.001,0.005] 


CIII 




0.020 


[0.000, 0.043] 


Ao 




5.310 


[2.317,9.474] 


2(<2hii 


= 0.5) 


9.661 


[7.894, 11.590] 


z(<2hii 


= 0.99) 


6.762 


[5.800, 7.819] 



Table 1. The marginalized posterior probabilities with 95% C.L. errors of 
all free parameters (top three parameters) and derived parameters (from the 
fourth parameter down) for the reionization model with PopII and PopIII 
stars. 



the fact that the photon emissivity at those epochs are purely de- 
termined by PopII stars. However at higher redshifts, the value of 
7Yi on increases with z because of the presence of PopIII stars. It is 
clear that the data cannot be fitted with PopII stars with constant 
Afion.n alone, one requires a rise in A?i on at higher redshifts. For 
the kind of chemical feedback employed in the model, the rise is 
rather smooth and gradual. 

The mean values of parameters quoted in Table Q] are simi- 
lar to the best-fit model described in Choudhury & Ferrara (2006b I 
and hence the corresponding reionization history is similar to those 
described in the same paper. This can be readily verified from Fig- 
ure Q] where we see that reionization starts around z w 15 driven 
by PopIII stars, and it is 90 per cent complete by z w 7.5. Af- 
ter a rapid initial phase, the growth of the volume filled by ionized 
regions slows down at z < 10 due to the combined action of chem- 
ical and radiative feedback, making reionization a considerably ex- 
tended process completing only at z w 6. We refer the reader to 
our earlier papers for a discussion of this model. Our likelihood 



analysis shows that reionization is 50 (99) % complete between 
redshifts z =7.9 - 1 1.6 (5.8 - 7.8) at 95% confidence level. Hence, 
under the assumptions made in the model, we find that completion 
of reionization cannot occur earlier than z ~ 8, essentially ruling 
out models of very early reionization. The reason for this is that 
the number of photons in the IGM at z = 6 is very low as implied 
by the Lytv forest data. In order to take the data point into account, 
the models typically cannot have too high a emissivity at z ~ 6. 
On the other hand, the constraints on t c \ imply that reionization 
must be initiated early enough. Thus the IGM has to go through 
a gradual reionization phase. As we discussed above, the gradual 
reionization is maintained by a combined action of radiative and 
chemical feedback effects. 

Interestingly, we find that a couple of data points for dA^L /dz 
lie above the 2-cr limits of our analysis. In models where these 
points agree with the data, the photon mean free path A m f p , and 
hence the photoionization rate Tpi, are relatively smaller. These 
lead to larger GP optical depths which then violate the Lya forest 
constraints. This discrepancy can arise either (i) because of some 
unaccounted systematics present in the data or (ii) from the simpli- 
fying assumptions made in our models for calculating A m f p . The 
actual reason needs to be investigated further. 



3 PRINCIPAL COMPONENT ANALYSIS 
3.1 Motivation 

It is most likely that the star-forming efficiencies and escape frac- 
tions and hence Ni on are functions of halo mass and redshift; how- 
ever since the dependencies are not well understood, they were 
taken to be constant for each considered stellar population in the 
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previous Section. The question one can ask is that how would the 
constraints on reionization histories of the previous Section change 
when the evolution of TVion is taken into account. Ideally one would 
like to do a rigorous likelihood analysis with N[ on varying with 
z and see the possible ranges of reionization histories consistent 
with available data. One possible approach could be to parameter- 
ize iVion(z) using some (known) function and constrain the param- 
eters of the function (Pritchard, Loeb, & Wyithe 2010). However, 
it is possible that the reionization constraints thus obtained could 
depend on the nature of the function chosen. In addition, it is not 
clear as to how many parameters should be used to parameterize 
the function. 

An alternative approach is to assume Nion (z) to be completely 
arbitrary and decompose it into principal components. These prin- 
cipal components essentially filters out components of the model 
which are most sensitive to the data. Obviously, these compo- 
nents are the ones which can be constrained most accurately, while 
the others cannot be done so. This principal component analysis 
(PCA), thus, should give an idea as to which aspects of -/Vj on can 
be constrained with available data. This implies that one should get 
a clear idea about the optimum number of parameters required to 
model JVi on to fit the data most accurately. 

In order to carry out such analysis, we modify the model de- 
scribed in the previous Section in following respects: 



• We take Nion to be a function of z. Unlike in the previous Sec- 
tion, we do not explicitly assume the presence of two population of 
stars but rather we include only one stellar population; any change 
in the characteristics of these stars over time would be accounted 
for in the evolution of Mon. 

• Clearly, the chemical feedback prescription has to abandoned 
in this model, as there are no two different populations of stars any- 
more. The chemical feedback is rather taken into account indirectly 
by the evolution of Ni on . However, we retain radiative feedback in 
the model given its weak dependence on the specific stellar popu- 
lation properties. 



In recent years there has been a wide use of this method in 
cosmological data analysis. The first set of works were mostly 
related to CMB data where, e.g., |Efstafhiou & Bond (1999} and 
Efstathiou (2002 1 used principal component analysis of CMB 
anisotropy measurements to investigate degeneracies among cos- 
mological parameters. Kadota et al. (2005 ) applied PCA to study 
how accurately CMB observables can constrain inflaton poten- 
tial in a model-independent manner. Leach (2006 ) used PCA tech- 
niques for measuring departures from scale-invariance in the pri- 
mordial power spectrum of density perturbations using cosmic mi- 
crowave background (CMB) Ci data. Mor tonson & Hu (2008| > de- 
veloped a model-independent method to study the effects of reion- 
ization on the large-scale E-mode polarization for any reioniza- 
tion history with the help of principal component analysis fol- 
lowed by the earlier work by Hu & Holder (2003 ). In the con- 
text of weak lensing surveys, Munshi & Kilbinger (2006 1 studied 
the degeneracies between cosmological parameters and measure- 
ment errors from cosmic shear surveys using PCA. The PCA has 
also been employed as an effective tool in the context of type 
la supernova observations to constrain the equation of state of 
dark energy (Huterer & Starkman 2003, Huterer & Cooray 2005 , 
|Crittenden, Pogosian, & Zhao 2009"llClarkson & Zunck el 2010). 



3.2 Basic theory of PCA 

Consider a set of n b s observational data points labeled by 
G a , a = 1, 2, . . . , n bs- Recall that Q a can represent com- 
binations of different data sets, e.g., in our case Q a = 

{log(r P i),T c i,diV LL /dz}. 

Now, let us assume that our model contains an unknown func- 
tion iVk,n(z), which we wish to constrain through observations. We 



can divide our entire redshift interval z m i n 



into (equal) bins 



of width Az and represent iVion(z) by a set of n in discrete free 
parameters 

N lon (zi) = Ni; i = l,2, ...,n bin (12) 
where 



Zi = Z min + (t — l)Az 

and the bin width is given by 



Az 



Tlbin 



(13) 



(14) 



In other words, we have modelled reionization using the value of 
Nion in each redshift bin. We can also include other free parame- 
ters apart from Ni on (zi) in the analysis, like the normalization of 
the mean free path Ao, cosmological parameters etc. However, for 
the moment let us assume that these parameters are fixed (known 
from other observations) and concentrate on Ni on (zi) only. We will 
address the inclusion of other parameters later in this Section. 

The next step is to assume a fiducial model for Ni on (zi), 
which we denote by Nf^lzi). The fiducial model should be cho- 
sen such that it is close to the "true" model. The departure from the 
fiducial model is denoted by 



5N lo n(Zi) = N ion ( Zi ) - N™(zi) EE SN t . 

We can then construct the ribm x n t>in Fisher matrix 



F, 



i og^dg^ 

ol dNi ON, 



(15) 



(16) 



where g^ is theoretical value of g a modelled using the JV, and 
a a is the observational error on g a . The derivatives in the above 
relation are evaluated at the fiducial model Ni — Nf d 

Once the Fisher matrix is constructed, we can determine its 
eigenvalues and corresponding eigenvectors. The principal value 
decomposition is then given by the eigenvalue equation 



(17) 

3 = 1 

where Xk are the eigenvalues and the eigenfunctions corresponding 
to Afc are the fc-th column of the matrix Sik, these are the principal 
components of Ni. They can be thought of a function of z i.e., 

Sik = Sk (•£()• 

The eigenvalues Afe are usually ordered such that Ai > A2 > 
. . . > A„ bin , i.e., Ai corresponds to the largest eigenvalue while 
Anbin the smallest. The eigenfunctions are both orthonormal and 
complete and hence we can expand any function of z as linear com- 
binations of them. In particular we can expand the departure from 
the fiducial model as 



It is worthwhile to mention that any analysis based on the Fisher ma- 
trix Fij , in principle, depends on the fiducial model chosen. The principal 
component analysis, which essentially involves diagonalizing Fij , is thus 
dependent on the choice of N^ d too. In this sense, the PCA is not com- 
pletely model-independent. 
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™bin "bin 

SNi = m k S k { Zi )- m k = ^ 8N ion {zi)S k {zi) (18) 

fc=l i = l 

where are the expansion coefficients with m k = for the fidu- 
cial model. We can now describe our model by the coefficients m k 
rather than the original parameters SNi ■ The advantage is that, un- 
like Ni, the coefficients m k are uncorrected with variances given 
by the inverse eigenvalue: 

(rm nij) = ^-8ij (19) 

The accuracy with which we can determine SNi on at a particular Zi 
is determined by the Cramer-Rao bound 

fe=i Afe 

So, the largest eigenvalues correspond to minimum variance. The 
eigenvalues which are smaller would essentially increase the uncer- 
tainty in determining 8Ni on (zi). Hence, most of the information 
relevant for the observed data points Q a is contained in the first 
few modes with the largest eigenvalues. One may then attempt to 
reconstruct the function SNi on (zi) using only the first M < ribin 
modes: 

AI 

5N m = Y^m k S h (zi). (21) 

fc=l 

However, in neglecting the last ribin — M terms, one introduces 
a bias in determining SNi on (zi). One has to then use a carefully 
chosen M to perform the analysis; the choice usually depends on 
the particular problem in hand. We shall discuss our choice of M 
in the next Section. 

In realistic situations, there will be other free parameters (apart 
from m k or SNi) in the model; these could be, e.g., the normal- 
ization of the mean free path An, cosmological parameters etc. Let 
there be n cxt number of extra parameters other than m k ; this means 
that we are now dealing with a total of n to t = nun + ^cxt parame- 
ters. In this case, we can still form the Fisher matrix of n to t x ntot 
dimensions which can be written as 

where F is the ribin x ribin -dimensional Fisher matrix for the SNi, 
F' is the next x n cxt -dimensional Fisher matrix for the other pa- 
rameters and B is a ribin x n cxt -dimensional matrix containing the 
cross-terms. One can then invert the above J- to obtain the cor- 
responding Hessian matrix T = J r_1 . Following that, one simply 
retains the sub-block T corresponding to SNi whose principal com- 
ponents will be "orthogonalized" to the effect of the other parame- 
ters. The resulting "degraded" sub-block will be lPres s~eTal. 1992t 

F = T _1 = F-BF'~ 1 B T (23) 

In this work we keep the cosmological parameters fixed; how- 
ever we still need to use the above formalism to marginalize over 
Ao- In that case, obviously n cxt = 1. 



4 RESULTS 

The detailed results of our PCA are presented in this Section. 




LO 20 30 



Z 

Figure 2. The Fisher matrix Fij in the z — z plane. 



4.1 Fiducial model 

The first task is to make an assumption for the fiducial model 
N ion (z). The model should match the Tpi and dNhh/dz data 
points at z < 6 and also produce a r e i in the acceptable range. 
Unfortunately, the simplest model with iVion being constant does 
not have these requirements (recall models with only PopII stars 
were disfavoured in the previous Section). We have found earlier 
that the effective 7Vi on should be higher at early epochs dominated 
by PopIII stars and should approach a lower value at z ~ 6 deter- 
mined by PopII stars. In this work we take N^ to be the model 
given by mean values of the free parameters in Section |2~2l 

The choice of this TV^ may seem somewhat arbitrary as there 
could be many other forms of 7Vi on which may match the data 
equally well. We have chosen this to be our fiducial model be- 
cause of the following reasons: (i) it is obtained from a physically- 
motivated model of star formation which includes both metal-free 
and normal stars, (ii) it is characterized by a higher iVi on at higher 
redshifts and hence produces a good match with different observa- 
tions considered in this work, and (iii) the transition from higher to 
lower values is smooth (i.e., there is no abrupt transition or sharp 
features). The final conclusions of this work (to be presented later 
in the Section) would hold true for any fiducial model having these 
three properties (though the actual functional form might be differ- 
ent). The match with the data for our fiducial model is similar to 
Fig. 2 of |Choudhury (2009) . 

We have run the reionization models over a redshift range 
l^min ■ 2 ma x ] = [0 : 30], with a bin width of Az = 0.2. This 
gives ribin = 151. We have checked and found that our main con- 
clusions are unchanged if we vary the bin width between 0.1-0.5. 

The Fisher matrix Fij defined in equation dl6t is evaluated 
at the fiducial model and is shown as a shaded plot in the z — z 
plane in Figure|2] Firstly, the components of the the matrix vanish 
for z < 2 because there are no data points considered at these red- 
shifts. The plot shows different characteristics for Fy at redshift 
intervals 2 < z < 6 and z > 6. For z < 6, the values of Fij are 
considerably higher because it is determined by the sensitivity of 
Tpi and cWll /dz on Mon and it turns out that Tpi is extremely 
sensitive to changes in JVi on . One can see a band-like structure in 
the information matrix which essentially corresponds to the pres- 
ence of data points. The regions where data points are sparse (or 
non-existent, like between z — 2 and 3), the value of Fij is rel- 
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Figure 3. The inverse of eigenvalues which essentially measures the vari- 
ance on the corresponding coefficient m;. For modes larger than 5, the 
eigenvalues are extremely small. 



Figure 4. The first 5 eigenmodes of the Fisher matrix, i.e., S'fc(z); k 
1, ... ,5. 



atively smaller, implying that one cannot constrain JVi on from the 
data in those redshift bins. On the other hand, the information at 
z > 6 is determined by the sensitivity of T e i on JVion ■ Once can see 
that Fij — > at the highest redshifts considered; this is expected 
because the collapsed fraction of haloes is negligible at those red- 
shifts and hence there exist no free electrons to contribute to r e i. 
The precise redshift range at which Fij become negligible depends 
on the (measured) value of r e \. For the WMAP7 measurements, we 
find that Fy is negligible for z > 14; if, e.g., the measured value of 
r e i were higher, Fij would be non-negligible till relatively higher 
redshifts. We can thus conclude that it is not possible to constrain 
any parameters related to star formation at redshifts z > 14 using 
the data sets we have considered in this work. 

Once we diagonalize the matrix Fij, we obtain its eigenvalues 
and the corresponding eigenmodes. The inverse of the eigenvalues, 
which are essentially the variances of the corresponding modes, are 
plotted in Figure[3] Since the eigenvalues Ai are sorted in ascend- 
ing order, the variances are larger for higher modes. For modes 
i > 5, the eigenvalues are almost zero and the variances are ex- 
tremely large. This implies that the errors on JVi on would increase 
dramatically if we include modes i > 5. 

The first 5 eigenmodes which have the lowest variances are 
shown in Figure [4] Clearly, all these modes tend to vanish at 
z > 14, which is because of Fij being negligible at these redshifts. 
Also, modes are identically zero at z < 2 because we have not used 
any data points at these redshifts. The first 4 modes essentially trace 
the sensitivity of Tpi and dN^/dz at z < 6 on the value of JVi on - 
One can see a number of spikes and troughs in these modes whose 
positions correspond to the presence of data points and amplitudes 
correspond to the error-bars on these data points (smaller the error, 
larger the amplitude). The shape of the 5th mode is vary much dif- 
ferent from the previous four. This mode essentially contains the 
behavior of JVion at z > 6 and hence it characterizes the sensitivity 
of r e i on JVion. Since T e \ is obtained by integrating the reioniza- 
tion history over the whole redshift range, the sensitivity covers a 
wide range of redshifts (which is unlike the sensitivity of Fpi). The 



sensitivity is maximum around z ~ 8, which is determined by the 
nature of the fiducial model. The sensitivity falls at z > 8 because 
there is a reduction in the number of sources and free electrons. In- 
terestingly the sensitivity falls at z < 8 too which is due to the fact 
that reionization is mostly complete at these redshifts x e — > 1 and 
hence changing JVi on does not change the value of r e i significantly. 

The modes with smaller eigenvalues have large variances and 
hence introduce huge uncertainties in the determination of JVion. 
The modes are characterized by sharp features at different redshifts 
and they do not contain any significant information about the over- 
all reionization history. 



4.2 Choice of the number of modes 

The next step in our analysis is to decide on how many modes M to 
use. In the case where M = n D i n , all the eigenmodes are included 
in the analysis and no information is thrown away. However, this 
would mean that modes with very small eigenvalues (and hence 
large uncertainties) are included and thus the errors in recovered 
quantities would be large. Reducing M is accompanied by a reduc- 
tion in the error, but an increased chance of getting the recovered 
quantities wrong (which is known as bias). 

It is thus natural to ask what could be the optimum value of 
M for calculations. The most straightforward way, which is used 
often, is to determine it by trial and error, i.e., more and more terms 
are added till one gets some kind of convergence in the recovered 
quantities (Mortonson & Hu 2008 ). Let us first work out the sim- 
plistic trial-and-error approach to fix M and as we shall see that 
this would be helpful in understanding recovery of various parame- 
ters using PCA. We have already discussed that inclusion of modes 
> 5 implies drastic rise in the errors. Hence, it seems that M < 5 
would be a good choice. The question is whether throwing away 
such a large number of modes (n D i n — M) would introduce large 
biases in the recovered quantities. 

In order to examine these issues in more detail, let us assume 
that the underlying "true" form of JVion is very different from the 
fiducial model we have chosen and then try to estimate the errors 
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Figure 5. Recovery of various quantities related to reionization when the input underlying model of N\ OIl {z) is assumed to be a step function (shown by solid 
lines). The extent of recovery is shown when the first 3 (short-dashed lines), 5 (long-dashed lines), 7 (short-long-dashed lines) PCA modes are included in the 
analysis. 



Parameters 



Input value 



M 



Recovered values 
: 3 M = 5 M = 7 



z(Qhii 
z(Qhii = 



= 0.5) 
0.99) 



9.817 
8.337 



9.722 
6.311 



10.004 
7.875 



9.728 
8.037 



Table 2. The recovered quantities for a input model where Mon is repre- 
sented by a step function when only the first M PCA modes are included in 
the analysis. 



we make in recovering this underlying model using only the first 
few modes. In order to put our method to test, it is then natu- 
ral to assume an underlying model which is noticeably different 
from the fiducial one and study its recovery using only the first few 
modes. Recall that the fiducial model represents a smoothly varying 
Mon, so we assume the underlying input model to be one having 
an abrupt transition, e.g., a step function: 



10 for z < 7 
40 for z > 7 



(24) 



The parameters in the model are adjusted so that it matches with ob- 
servations of Tpi and dMx /dz at z < 6 and also gives the correct 
observed value of r c \. The idea would be to check whether we are 
able to recover quantities of interest with reasonable accuracy with 
M = 5. The model chosen above is similar to the abrupt-transition 
model considered in Choudhury & Ferrara (2005 ). 

The results of our analysis are shown in Figure [5] and in Ta- 
ble^ In the figure, we have plotted, as functions of redshifts, the 
four quantities relevant to reionization which we would like to re- 
cover, namely, Mon (top-left panel), the photoionization rate Tpi 
(top-right panel), the volume filling factor of ionized regions Qhii 
(bottom-left panel) and the globally averaged neutral hydrogen 
fraction xui (bottom-right panel). Different curves represent the 
input step model (solid) and the recovered quantities for three val- 
ues of M = 3, 5, 7 (short-dashed, long-dashed, short-long-dashed, 
respectively). We have not shown results for intermediate values 



of M (i.e., M = 4, 6) because the difference between successive 
plots is too small to be noticed. It is clear from the top left panel that 
the recovered Mon is excellent for z < 6 because the fiducial and 
input models agree at these redshifts, which is a manifestation of 
the fact that the value of Mon is highly constrained by good qual- 
ity data points at these redshifts. On the other hand, the recovery 
is quite poor for z > 6. This is because the evolution at z > 6 is 
only weakly constrained by r e i. In particular at z > 14, the modes 
are essentially zero and hence all models tend to the fiducial one 
implying that it is impossible to recover Mon at z > 14 with the 
first few modes. 



The top-middle and top-right panels show the corresponding 
plots for the photoionization rate Tpi and the redshift distribution 
of Lyman-limit systems dM^/dz respectively. The input model 
has a sharp feature around z ~ 7 in both the quantities arising 
mainly from the abrupt step in Mon- The reionization is complete 
(Qhii = 1) at z w 8 after which the photoionization rate rises 
sharply because of overlap of ionized regions and consequent rise 
in mean free path (which manifests itself as a sharp drop in the 
number of LLS). This rise in Tpi is suddenly halted at z = 7 
where we see a sharp decline because of the corresponding step 
decline in Mon- Following that, Tpi settles to a smaller value (cor- 
responding to a smaller value of Mon) and subsequently shows a 
gradual rise arising again from the rise in mean free path. Interest- 
ingly, this feature is completely missing in the recovered model for 
M = 3 (and also for M = 4, not shown in the figure). The fea- 
ture shows up when M is increased to 5, though the exact nature 
of this feature is not identical to the input one. Increasing M to 7 
introduces other sharp features at z ~ 10 which are not present in 
the input model. Of course, the recovery at z > 14 is poor as most 
of the eigenmodes hardly contain any information at these redshifts 
and the recovered models simply follow the fiducial model. Hence, 
the recovery of the photoionization rate and the LLS distribution is 
probably not satisfactory overall, however we can recover it with 
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Figure 6. Dependence of Risk, error and bias as defined in equation (25} 
on the number of modes M. The blow-up of a region around M = 5 is 
shown in the inset which shows that there is a clear minimum in the Risk at 
M = 5. 



reasonable accuracy for z < 12 by considering the first M — 5 
modes. 

The recovery of r e i is shown in the bottom-left panel. It is 
clear that the recovery is good for all values of M. In most reion- 
ization studies, the quantities of main interest are the Qhii and xm, 
which are plotted in the bottom-middle and bottom-right panels re- 
spectively. One can easily see from both the panels that that the 
agreement between the M — 3 case and the input model is quite 
poor (which is the case for M — 4 as well). In particular, reioniza- 
tion is complete at z ~ 8.5 for the input model, while it completes 
only at z ~ 6 for M = 3 case (see Table O. However, the mo- 
ment M is increased to 5, one has a remarkable match with the 
input model, e.g., the difference in Qhii is < 0.1 for z < 12 while 
the difference is < 10% for z < 10. Unfortunately, we cannot re- 
cover the sharp feature in xm around z — 7 for the input model 
(which corresponds to a similar feature in Tpi, discussed above) 
for the M = 5 case, however the overall agreement with the input 
model is still quite good. The agreement is further improved as we 
increase M (to 7 in the plot) but that comes at the cost of increas- 
ing errors. As far as recovering the basic reionization history (i.e., 
evolution of Qhii and xm) is concerned, M = 5 seems to be the 
optimum choice. 

It is important to point out that the recovery of various quan- 
tities related to reionization is good (or excellent, in some cases) 
even when the recovered value of Num is incorrect. This may seems 
surprising as it is the value of Ni on that acts as a source for reion- 
ization. To understand this apparent paradox, note that the recovery 
of JVi on is poor mostly at z > 12. At these redshifts the collapsed 
fraction d/ co u/di is typically small, hence the source emissivity 
A^ond/coii/dt — ► at these epochs. Hence even if we change the 
value of TYion, the absolute change in the emissivity is negligible 
and hence the reionization process remains relatively unaffected. 
There is another way of looking at it: the extent of recovery of var- 
ious quantities at z > 6 is determined by the behaviour of PCA 
modes at z > 6 which, in turn, is determined by the data set related 
to r e i. Now r c i is most sensitive to the ionized fraction Qhii (2) at 



z > 6. Hence, it is not surprising that Qhii(z) would be nicely 
recovered at these redshifts. Such arguments can be extended for 
other quantities too. This also brings out the fact that in order to re- 
cover Ni on (z) (and thus star formation, escape fraction and chem- 
ical feedback) reliably, one requires data points at z > 12 related 
to quantities which are sensitive to Nion, like say, hypothetically, 
a good constraint on Tpi at z ~ 12 can constrain Nion at those 
redshifts. 

To summarize our results on recovering the input step model, 
the recovery of all the quantities is excellent for z < 6. We find 
that recovery of ATi on at z > 6 is not satisfactory. The recovery of 
Tpi at z < 12 is quite reasonable by considering the first M — 5 
modes. Fortunately, the recovery of Qhii and xm turns out to be 
excellent for M — 5. Hence we can use the coefficients rm of these 
5 best constrained eigenmodes as our model parameters instead of 
Ni on (zi) without significant loss of information. 

We should mention that the above analysis depends on the 
choice of the input model which is taken to be the step function. 
In fact, the recovery is better if the input Num is a smoother func- 
tion (provided it satisfies the observational constraints, of course). 
In particular, all models which are bracketed by the fiducial model 
and the step model would end up giving good agreements for 
Tpi, dA?LL/dz, Tei, Qhii and xni- Of course, if the input models 
have sharp features at some particular redshift(s) z > 6, those fea- 
tures may not be recovered satisfactorily by including only first few 
terms. 

A slightly more formal approach is to estimate M by minimiz- 
ing the quantity Risk, which is defined as iWasserman et al. 2001b 



R = 



j=l i=l \ 



(25) 



The 1st term in the RHS is the bias contribution which arises 
from neglecting the higher order terms, and the 2nd term is the 
uncertainty given by Cramer-Rao bound which rises as higher or- 
der terms (i.e., those corresponding to smaller eigenvalues) are in- 
cluded: 



9 \ 

) > E 



A* 



(26) 



However, the calculation of Risk, as defined above, involves as- 
sumption of an "underlying model", hence the determination of M 
using this method would be model-dependent. Let us assume the 
underlying model to be the same as equation i24l . Then the depen- 
dence of the Risk on the number of modes M is shown in Figure 
[6] In addition, we also show the plots of bias [first term of the rhs 
in equation l |25H and the error [second term of the rhs in equation 
are also shown. It is clear that the value of error is small for 
lower M which is a direct consequence of small eigenvalues. The 
error shoots up drastically for M > 5 which is what we discussed 
in the previous Section. On the other hand, the bias is higher for 
small M and decreases gradually as more and more terms in the 
summation are included. The Risk, which is the sum of these two 
quantities, has a clear minimum at M = 5 (which is more clear 
from the inset in Figure[6j. Hence we conclude that M = 5 is the 
optimum value to be used. 

The main conclusion of this Section is that one needs five 
parameters to describe the reionization history which can be con- 
strained with the data considered in this paper. Out of these five, 
four parameters are required to describe the emissivity at z < 6 
where most of the data points exist; these parameters are the best- 
constrained ones. The fifth parameter characterizes the evolution 
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Parameters 


Mean value 


95% confidence limits 


mi 


0.002 


[-0.002,0.018] 


m 2 


0.007 


[-0.001,0.032] 


TO3 


-0.003 


[-0.012,0.004] 


1714 


0.003 


[-0.003,0.015] 


ms 


-0.065 


[-0.276,0.003] 


A 


4.450 


[3.245, 5.906] 


z{Qmi = 0.5) 
z(Qhii = 0.99) 


10.349 
8.357 


[9.528,11.585] 
[5.800, 10.270] 



Table 3. The marginalized posterior probabilities with 95% C.L. errors of 
all free parameters (top six parameters) and derived parameters (from the 
seventh parameter down) for the reionization model with principal compo- 
nent analysis. 



of -/Vion at 2 > 6 and is essentially determined by the WMAP con- 
straints of r c i. Inclusion of more parameters would lead to overfit- 
ting of the data and hence the constraints on the parameters would 
be highly uncertain. 



4.3 Constraints on reionization history 

The constraints on reionization are obtained by performing a 
Monte-Carlo Markov Chain (MCMC) analysis over the parameter 
space of PC A amplitudes {mi, ... , m^} and Ao. The cosmological 
parameters are kept fixed to the WMAP7 best-fit values. In order 
to carry out the analysis, we have developed a code based on the 
publicly available COSMOMC0 dLewis & Bridle 2002t (which is 
widely used for running MCMC on CMB and other cosmological 
data sets). To get accurate results from MCMC, we ensure that the 
parameter chains contain enough independent samples over a suf- 
ficiently large volume of parameter space so that the density of the 
samples converges to the actual posterior probability distribution. 
We run a number of separate chains (varying between 5 to 10) until 
the Gelman and Rubin convergence statistics, R, corresponding to 
the ratio of the variance of parameters between chains to the vari- 
ance within each chain, satisfies R— 1 < 0.01. 

The mean values and the 95% confidence limits on our param- 
eters obtained from our analysis are shown in Table|3] Our fiducial 
model mi = ni2 = = 1714 = 7715 = is included within 
the 95% confidence limits of the parameters corresponding to the 
eigenmode amplitudes, however the mean values show clear depar- 
tures from the fiducial model. This implies that the model charac- 
terized by the mean values of parameters, loosely mentioned as the 
"mean model" hereafter, is different from the fiducial one. 

In order to see how different it is, we show the evolution of 
various quantities related to reionization is shown in Figure [7] The 
solid lines represent the mean model while the shaded region cor- 
respond to 95% confidence limits. For comparison, we have also 
plotted the fiducial model (short-dashed) and the step model (long- 
dashed) which was introduced in Section l4~2l We find that the fidu- 
cial model is within the 95% confidence limits for the whole red- 
shift range, while the step model is within the 95% confidence lim- 
its for 2 < 10. Also note that the fiducial model is actually near 
the edge of the shaded region, implying that there is a wide range 
of models allowed by the data which are characteristically different 
from the fiducial model. 



3 http://cosmologist.info/cosmomc/ 



The next point to note is that all the quantities are highly con- 
strained at 2 < 6, which is expected as most of the observational 
information related to reionization exists only at those redshifts. 
The errors also decrease at 2 > 12 as there is practically no infor- 
mation in the PCA modes and hence all models converge towards 
the fiducial one. This implies that early stages of reionization are al- 
most similar independent of the iN/ion chosen. The most interesting 
information regarding reionization is concentrated within a redshift 
range 6 < z < 12. 

It is very clear from the plot of Ni on (z) (top-left panel) that 
such quantity must necessarily increase from its constant value at 
2 < 6. This rules out the possibility of reionization with a single 
stellar population having non-evolving IMF and/or star-forming ef- 
ficiency and/or escape fraction. The value of AT; OI) can be almost 40 
times larger than its value at 2 < 6. Also note that iVi n need not 
be a monotonic function of 2. For example, the mean model, which 
is constant for 2 < 6, shows an increase for 2 > 6 followed by a 
decrease at 2 w 7. The plot shows a subsequent increase around 
z w 11, however one should remember that the information con- 
tained within eigenmodes are severely limited at these epochs. 

From the plot of Fpi(z) (top-middle panel), we find that the 
mean model is consistent with the observational data at 2 < 6, 
as expected. The errors corresponding to 95% confidence limits are 
also smaller at 2 < 6 for reasons discussed above. The photoioniza- 
tion rate for the fiducial model shows a smooth rise at 2 > 6 with a 
peak around 2 « 10, however model described by the mean values 
of the parameters shows a much sharper rise and much prominent 
peak. The location of the peak is around 2 ~ 6.5. The highest 
value of Fpi allowed by the data can be as high as 10" 10 s _1 (95% 
confidence level), which is about 100 times the values typically ob- 
served at 2 < 6. The prominent peak-like structure is also present 
in the dN^/dz (top-right panel). Interestingly, the high-Fpi mod- 
els predict that dN^/dz ~ at 2 ~ 6.5, hence any sighting of 
LLS at these epochs would put more constraints on the models. 

The limits on r e i (bottom-left panel) are, as expected, similar 
to the WMAP7 constraints. We find that the mean r e t is slightly 
higher than the best-fit WMAP7 value because a wide range of 
models with early reionization are allowed by the data. 

The constraints on the reionization history can be seen from 
the plot of Qhii(2:) (bottom-middle panel). The growth of Qhii 
for the fiducial model is somewhat gradual. On the other hand, the 
mean model, which is characterized by sharp peak structures in 
A?i on and Tpi at z > 6, shows a much faster rise in Qhii at ini- 
tial stages, though the completion of reionization takes place only 
at 2 ~ 6. The shaded regions show that reionization can be com- 
plete as early as 2 w 10.5 (95% confidence level). These models 
of early reionization are essentially characterized by high A^ on at 
6.5 < 2 < 10 (so that enough contribution to r e t is achieved to 
match the WMAP7 constraints) followed by a sharp decrease at 
2 < 6.5 so that the emissivity becomes low enough to match the 
photoionization rate obtained from Lya forest data. 

Similar conclusions can be obtained from the plot of xm{z) 
(bottom-right panel). In general, the models allowed by the 95% 
confidence limits are consistent with the available data points 
(shown by points with error-bars). Models of very early reioniza- 
tion (i.e., those with high N- lon at 6.5 < 2 < 10) show sharp 
decrease in xm at 2 « 10 and it can become as low as 10 -6 
at 2 « 6.5. However, the neutral fraction has to increase sharply 
again at 2 < 6.5 (corresponding to sharp decrease in A r i on ) so as to 
match the Lya forest constraints. Thus the evolution of xm is not 
monotonic for these models. On the other hand, models with rela- 
tively smoothly evolving Ni on (ones similar to the fiducial model) 
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Figure 7. The marginalized posteriori distribution of various quantities related to reionization history obtained from the PCA. The different quantities in 
the panels are identical to those in Figure [T] The solid lines correspond to the model described by mean values of the parameters while the shaded regions 
correspond to 2-a limits. In addition, we show the properties of the fiducial model (short-dashed lines) and the step model used in Section POl (long-dashed 
lines). The points with error-bars denote the observational data points which are identical to those in Figure[T] 



show gradual decrease in xm between 6 < 2 < 10 and it smoothly 
matches the Lya forest data. The evolution of the neutral fraction 
is thus monotonic in such models with smoothly evolving ]Vi on . 

If we now go back to the lower portion of Table|3] we find that 
reionization is 50% complete between redshifts 9.6 - 12.0 (95% 
confidence level), while it is almost (99%) complete between red- 
shifts 5.8 - 10.6 (95% confidence level). Note that the lower limit 
on the redshift of reionization (5.8) is imposed as a prior on the 
parameters. 

Thus, the PCA shows that a wide range of reionization his- 
tories is still allowed by the data. Reionization can be quite early 
or can be gradual and late, depending on the behavior of A r i on (z). 
Hence, if one considers only the data we have used, it is practi- 
cally impossible to put any sensible constraints on chemical feed- 
back and/or the evolution of star-forming efficiencies and/or es- 
cape fractions. While this might seem somewhat disappointing at 
the moment, one can hope for much better constraints in near fu- 
ture when the magnitude of data sets are going to rise manifold. 
In fact, in order to keep the analysis simple, we have not used all 
the data sets available. For example, the constraints on the Lya 
and Ly/3 transmitted fluxes now extend beyond 2 = 6 and pos- 
sibly could constrain the models much more. However, our nu- 
merical code takes significantly more time while calculating the 
transmitted fluxes and also there remain uncertainties in the theo- 
retical modelling of the IGM at such redshifts (like the distribution 
of baryonic matter and the scatter in the temperature-density rela- 
tion); hence we have worked simply with the constraints on Tpi. 
Similarly the distribution of LLS at 2 > 6 could also be important 
in ruling out some of the allowed models. At present, there exists a 
data point at 2 w 6 which put limits dN^/Az = 8.91 ± 3.49. On 
the other hand, very high emissivity models predict cWll /dz ~ 
at 2 ~ 6.5. Hence constraints on LLS distribution at 2 ~ 6.5 can 
be helpful in shrinking the allowed parameter space significantly. 

We should also mention that the constraints obtained through 



the PCA are widely different from those obtained using the chem- 
ical feedback model of Section|2]involving PopII and PopIII stars. 
The model in Section [2] uses a particular prescription for chemi- 
cal feedback and assumes constant Mon.n, Mon.m, which results 
in an effective Ni -n(z) which is smoothly evolving and monotoni- 
cally increasing with 2. On the other hand, the models allowed by 
the PCA do not have any physical constraint regarding how -/Vi on 
should evolve. It turns out that in absence of any physical motiva- 
tion, current data does allow for non-monotonic Mon which may 
contain sharp features. Hence it is not surprising that the shapes of 
the allowed models are quite different from the chemical feedback 
models. 



5 DISCUSSION AND SUMMARY 

In this work, we have used a semi-analytical model 
( |Choudhury & Ferrara 2005] |Choudhury & Ferrara 2006b) to 
study the observational constraints on reionization. Assuming 
that reionization at 2 > 6 is primarily driven by stellar sources, 
we have developed a formalism based on principal component 
analysis to model the unknown function Ni on (z), the number of 
photons in the IGM per baryon in collapsed objects. We have used 
three different sets of data points, namely, the photoionization 
rates Tpi obtained from Lya forest Gunn-Peterson optical depth, 
WMAP7 data on electron scattering optical depth r e i, and the 
redshift distribution of Lyman-limit systems dA^LL/d2 at 2 ~ 3.5. 
The main findings of our analysis are: 

• The elements of the Fisher information matrix have larger val- 
ues for 2 < 6 where most of the data points are. There is hardly 
any information at 2 > 14, implying that no information on star- 
formation and/or chemical feedback can be obtained at these red- 
shifts using the available three data sets. 

• To model Monl^z) over the range 2 < 2 < 14 it is necessary 
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to include 5 modes. Using a larger number of modes improves the 
agreement but at the cost of increasing errors. 

• One may not be able to recover the actual form of Nion(z) us- 
ing only these 5 modes, however the recovery of Tpi and diVix /dz 
at 2 < 10 is quite satisfactory and that of Qhii, xm is excellent. 

• It is not possible to match available reionization data with a 
constant 7Vi on over the whole redshift range, i.e. Afj on must increase 
at z > 6. This is a signature of either of a changing IMF induced by 
chemical feedback and/or evolution in the star-forming efficiency 
and/or photon escape fraction of galaxies. The data allows for non- 
monotonic A r i on (2:) (and consequently of xui)- In particular, reion- 
ization histories could show sharp features around z ~ 7. 

• The PCA implies that reionization must be 99% completed 
between 5.8 < z < 10.3 (95% confidence level) and is expected 
to be 50% complete at z w 9.5-12. 

Our analysis provides the widest possible range in reionization 
histories (shown in Fig. 7) allowed by available data sets. It is, in 
some sense, unfortunate that there still exists a wide range of reion- 
ization scenarios that are allowed by the data. While the constraints 
at z < 6 are quite tight, one requires additional data points at z > 6 
to improve constraints on models of feedback and reionization. The 
most obvious addition would, of course, be observation of Gunn- 
Peterson trough in more QSOs at higher redshifts. In parallel, it is 
expected that observations of GRBs and Lya emitters could con- 
strain xm at 2 > 6, which again would result in improved con- 
straints. Finally, observations of large-scale EE polarization signal 
by future CMB probes, like PlancHQ would be extremely important 
in probing the evolution of JVi on at z > 6. Since the constraints 
obtained from the data are still unsatisfactory, there remains am- 
ple scope for developing physically-motivated theoretical models 
which can match a wide- variety of available data. This, in turn, re- 
quires significant improvement in our understanding of processes 
like chemical feedback and also the evolution of star-forming effi- 
ciencies and escape fraction. 
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